[57] ABSTRACT

A method and apparatus are described for detecting doubletalk in an acoustic echo canceller. The present invention examines the spectral characteristics of the near-end audio signal and the spectral characteristics of the far-end audio signal and determines from their comparison whether a condition of doubletalk exists. An exemplary implementation of the present invention is presented in an acoustic echo canceller wherein the adaptation of the adaptive filter taps is inhibited during periods of doubletalk.
DOUBLETALK DETECTION BY MEANS OF
SPECTRAL CONTENT 

This is a continuation of application Ser. No. 08/535,365, filed Sep. 28, 1995, now abandoned, which is a continuation of application Ser. No. 08/202,521, filed Feb. 28, 1994, now abandoned.

BACKGROUND OF THE INVENTION 

I. Field of the Invention

The present invention relates to echo cancellation. More 
particularly, the present invention relates to a novel and 
improved method and apparatus for determining a double- 
talk condition in an echo canceller. 

II. Description of the Related Art

Acoustic echo-cancellers (AEC) are used in teleconfer- 
encing and hands-free telephony applications to eliminate 
acoustic feedback between a loudspeaker and a microphone. 
In a cellular telephone system where the driver uses a 
hands-free telephone, acoustic echo cancellers are used in 
the mobile station to provide full-duplex communications. A 
block diagram of a traditional acoustic echo canceller is 
illustrated in FIG. 1. 

For reference purposes, the driver is the near-end talker with input speech signal v(n), and the person at the other end of the connection is the far-end talker with input digital speech signal x(n). The speech of the far-end talker is broadcast out of loudspeaker 2 in the mobile. If this speech is picked up by microphone 10, the far-end talker hears an annoying echo of his or her own voice. The output of microphone 10, r(n), is a digital signal. Typically, the functions performed by microphone 10 may be accomplished by a microphone, which converts the audio signal to an analog electrical signal, and an analog-to-digital (A/D) converter. The AEC identifies the impulse response between loudspeaker 2 and microphone 10, generates a replica of the echo using adaptive filter 14, and subtracts it in summer 12 from the microphone output, r(n), to cancel the far-end talker echo y(n). Since the adaptive filter generally cannot remove all of the echo, some form of echo suppression, provided by residual echo suppression element 18, is typically employed to remove any residual echo.

In FIG. 1, the far-end talker echo signal y(n) is illustrated as the output of acoustic echo path element 4, which is an artifact of the proximity of loudspeaker 2 and microphone 10. Noise signal w(n) and near-end speech signal v(n) are added to the far-end talker echo signal y(n), as illustrated by summing elements 6 and 8 respectively. It should be noted that summing elements 6 and 8 and acoustic echo path 4 are artifacts of the mobile environment and are presented for illustrative purposes.

Since adaptive filter 14 uses the far-end speech x(n) as a reference signal, it cannot cancel the near-end speech, because in general v(n) is uncorrelated with x(n). If adaptive filter 14 is allowed to adapt in the presence of v(n), the near-end speech will be added to the error signal e(n), which drives the filter tap coefficient adaptation, corrupting the estimate of acoustic echo path 4. It is therefore necessary to disable coefficient adaptation when both talkers are speaking, a condition referred to as doubletalk. During doubletalk, residual echo suppression element 18 must also be disabled to prevent corruption of the near-end speech. Doubletalk detector 16 detects the presence of doubletalk and provides control signals to adaptive filter 14 and residual echo suppression element 18 when doubletalk is present.
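The gating described above can be sketched in a few lines of Python. The sketch assumes a normalized LMS (NLMS) update, which is only one common choice; the text does not say which adaptive algorithm adaptive filter 14 uses. The echo path `h`, the filter length, the step size, the white-noise stand-ins for speech and the ideal doubletalk flags are hypothetical values chosen for a synthetic test, and residual echo suppression is not modeled.

```python
import random


def nlms_echo_canceller(x, r, num_taps, mu=0.5, eps=1e-6, doubletalk=None):
    """Echo cancellation with NLMS tap updates gated by a doubletalk flag.

    x          -- far-end reference samples x(n)
    r          -- microphone samples r(n) = echo + noise + near-end speech
    num_taps   -- adaptive FIR length (assumed to cover the echo path)
    mu, eps    -- NLMS step size and regularizer (illustrative values)
    doubletalk -- optional per-sample flags; True freezes the taps
    Returns the error signal e(n) and the final tap vector.
    """
    w = [0.0] * num_taps       # echo-path estimate
    buf = [0.0] * num_taps     # x(n), x(n-1), ..., newest first
    e = []
    for n, x_n in enumerate(x):
        buf = [x_n] + buf[:-1]
        y_hat = sum(wk * bk for wk, bk in zip(w, buf))   # echo replica
        e_n = r[n] - y_hat                                 # summer output e(n)
        e.append(e_n)
        if doubletalk is None or not doubletalk[n]:
            step = mu * e_n / (eps + sum(bk * bk for bk in buf))
            w = [wk + step * bk for wk, bk in zip(w, buf)]
    return e, w


def misalignment(w, h):
    """Euclidean distance between the learned taps and the true echo path."""
    h = list(h) + [0.0] * (len(w) - len(h))
    return sum((wk - hk) ** 2 for wk, hk in zip(w, h)) ** 0.5


# Synthetic scenario: far-end speech only for 2000 samples, then doubletalk.
random.seed(1)
h = [0.6, -0.3, 0.1]                                  # hypothetical echo path
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
v = [0.0] * 2000 + [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]
r = [y_n + v_n for y_n, v_n in zip(y, v)]
flags = [n >= 2000 for n in range(len(x))]            # an ideal detector

_, w_gated = nlms_echo_canceller(x, r, 8, doubletalk=flags)
_, w_free = nlms_echo_canceller(x, r, 8)
print("gated misalignment:  ", misalignment(w_gated, h))
print("ungated misalignment:", misalignment(w_free, h))
```

With the flags supplied, the taps learned during the far-end-only segment are kept intact through the doubletalk segment; without them, the uncorrelated near-end signal enters e(n) and drags the taps away from the echo path, which is the corruption described above.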

Doubletalk detection is the most critical element in any acoustic echo canceller. Network echo cancellers can monitor the fairly constant loss between x(n) and r(n) to gain information about whether near-end speech is present; acoustic echo cancellers do not have this property. Since the analog speaker volume control is under the control of the driver, the volume can be changed to any desired level at any time. The volume can even be set so high as to produce a gain between speaker and microphone. The microphone position may also change at any time.

Traditionally, doubletalk detection in acoustic environments is accomplished by monitoring the echo return loss enhancement (ERLE), which is defined as:

$$\mathrm{ERLE}\ (\mathrm{dB}) = 10\log_{10}\left(\frac{\sigma_y^2}{\sigma_e^2}\right) \qquad (1)$$

where $\sigma_y^2$ is the variance of the echo signal y(n) and $\sigma_e^2$ is the variance of the error e(n). The variances $\sigma_y^2$ and $\sigma_e^2$ are estimated using short-term energy measurements of r(n) and e(n), respectively. The ERLE measures how much energy is being removed in summing element 12. Classical doubletalk detectors declare that near-end speech is present if the ERLE falls below some preset threshold, such as 3 or 6 dB.
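A minimal sketch of the classical ERLE detector, assuming the short-term variances are computed over one frame of samples and, as in the text, that the frame energy of r(n) stands in for $\sigma_y^2$. The function names, the `floor` guard and the 6 dB default are illustrative choices, not part of the original description.

```python
import math


def erle_db(r_frame, e_frame, floor=1e-12):
    """Short-term ERLE of equation (1) for one frame.

    The frame energy of r(n) stands in for sigma_y^2 and the frame energy
    of e(n) for sigma_e^2; `floor` avoids division by zero.
    """
    var_y = sum(s * s for s in r_frame) / len(r_frame)
    var_e = sum(s * s for s in e_frame) / len(e_frame)
    return 10.0 * math.log10((var_y + floor) / (var_e + floor))


def classical_doubletalk(r_frame, e_frame, threshold_db=6.0):
    """Declare near-end speech when the ERLE drops below a preset threshold."""
    return erle_db(r_frame, e_frame) < threshold_db


print(erle_db([1.0] * 4, [0.1] * 4))               # about 20 dB: echo well cancelled
print(classical_doubletalk([1.0] * 4, [0.1] * 4))  # False
print(classical_doubletalk([1.0] * 4, [1.0] * 4))  # True (0 dB ERLE)
```

Because the decision depends only on the ratio in equation (1), anything that raises the error energy (near-end speech, an unconverged filter, or a changed echo path) triggers the same declaration.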

This doubletalk detection method is highly unreliable, especially in high-noise environments, for several reasons. First, this method requires adaptive filter 14 to be converged before the ERLE can provide any valid information. In a noisy environment like a car, adaptive filter 14 may not converge at all, or may converge extremely slowly, due to the noise and the long filter length required to model the acoustic channel. Second, the ERLE is highly variable because, due to the noise, adaptive filter 14 can only approximate the echo channel. The detection scheme therefore produces many false doubletalk detections. Third, a change in the impulse response of the echo path also produces a loss in ERLE. If people are moving within the mobile environment, or the microphone changes its position, the ERLE will drop, causing a false doubletalk detection.

SUMMARY OF THE INVENTION 

The present invention is a novel and improved method and apparatus for detecting doubletalk. This newly proposed method for doubletalk detection measures and compares the spectral content of the far-end reference signal x(n) and the received signal r(n). The unknown acoustic echo channel is modeled as a linear time-invariant (LTI) system. Although the unknown channel may in actuality vary with time, it changes slowly enough that the adaptive algorithm is able to track it, which permits the use of this model. A useful property of LTI systems is that they do not create any new frequencies. That is, if the input to an LTI system consists of frequencies A, B, and C, the output of the system can contain only scaled (and phase-shifted) replicas of these three frequencies. No new frequencies may be present at the output if the system is linear and time-invariant.
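The no-new-frequencies property can be checked numerically. The Python sketch below passes a periodic two-tone signal through a hypothetical four-tap FIR echo path, using circular convolution so that one period equals the steady-state response and the DFT bins are exact, and then adds a tone at a frequency the far-end signal does not contain, as a near-end talker would. A direct O(N^2) DFT keeps the example dependency-free; the frame length, bins and amplitudes are arbitrary assumptions.

```python
import cmath
import math

N = 64  # frame / DFT length (illustrative)


def dft(sig):
    """Direct O(N^2) discrete Fourier transform."""
    n_pts = len(sig)
    return [sum(sig[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def circular_filter(sig, h):
    """Steady-state output of FIR filter h for one period of a periodic input."""
    n_pts = len(sig)
    return [sum(h[m] * sig[(n - m) % n_pts] for m in range(len(h))) for n in range(n_pts)]


def occupied_bins(spec, rel_tol=1e-6):
    """Bins 0..N/2 whose magnitude is non-negligible relative to the largest bin."""
    peak = max(abs(c) for c in spec)
    return [k for k in range(len(spec) // 2 + 1) if abs(spec[k]) > rel_tol * peak]


x = [math.cos(2 * math.pi * 5 * n / N) + 0.5 * math.cos(2 * math.pi * 12 * n / N)
     for n in range(N)]                       # far-end reference: bins 5 and 12
h = [0.5, 0.3, -0.2, 0.1]                     # hypothetical echo-path response
y = circular_filter(x, h)                     # echo: scaled, shifted copies only
v = [0.4 * math.cos(2 * math.pi * 20 * n / N) for n in range(N)]  # near-end tone
r = [a + b for a, b in zip(y, v)]

print(occupied_bins(dft(x)))   # [5, 12]
print(occupied_bins(dft(y)))   # [5, 12]
print(occupied_bins(dft(r)))   # [5, 12, 20]
```

The echo occupies exactly the bins of the reference, only rescaled; the extra bin in r(n) can only have come from a source other than the echo.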

Through the Fourier transform, both the far-end reference signal x(n) and the received signal r(n) can be represented as a sum of complex exponentials. Since the received echo signal at the microphone sounds like the original far-end signal, the frequency components that are large in the received signal must also have been large in the reference signal. If there are large peaks in the received signal that are not present in the reference signal, then these peaks were not caused by echo. Therefore, by comparing the frequency peaks between the reference and received signals, it can be determined whether near-end speech is present, even without knowledge of the unknown echo channel.
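The comparison can be sketched end to end following the FIG. 3 processing chain given in the detailed description below: averaging the noise spectrum during silence, forming the noise-suppressed |S(k)|, subtracting $G_k|X(k)|$, applying the lower bound $C|N(k)|$, computing a Parseval energy, and testing it against a threshold. This Python class is a minimal sketch under stated assumptions, not the patented implementation: a direct DFT replaces the FFT; the smoothing factors, C = 0.5, the initial $G_k = 1$ and the use of the four largest |X(k)| peaks are hypothetical; the threshold is interpreted as a multiple of the averaged noise energy; and no silence detector is modeled (the caller decides when to call `update_noise`).

```python
import cmath
import math
import random


def magnitude_spectrum(frame):
    """|DFT| of one frame; a direct O(N^2) DFT stands in for the FFT."""
    n_pts = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                    for n in range(n_pts)))
            for k in range(n_pts)]


class SpectralDoubletalkDetector:
    """Hypothetical sketch of the FIG. 3 chain; all defaults are assumptions."""

    def __init__(self, n_fft, c=0.5, threshold=4.0, noise_alpha=0.9,
                 gain_alpha=0.9, n_peaks=4):
        self.n = n_fft
        self.c = c                      # lower-bound constant C (C < 1)
        self.threshold = threshold      # E_T must exceed threshold * noise energy
        self.noise_alpha = noise_alpha  # low-pass factor for |N(k)|
        self.gain_alpha = gain_alpha    # time-averaging factor for G_k
        self.n_peaks = n_peaks          # how many |X(k)| peaks update G_k
        self.noise = [0.0] * n_fft      # averaged noise magnitude |N(k)|
        self.gain = [1.0] * n_fft       # echo-channel magnitude estimates G_k

    def update_noise(self, r_frame):
        """During silence: low-pass average |R(k)| into |N(k)|."""
        a = self.noise_alpha
        self.noise = [a * nk + (1 - a) * rk
                      for nk, rk in zip(self.noise, magnitude_spectrum(r_frame))]

    def detect(self, x_frame, r_frame):
        """Return True if doubletalk is declared for this pair of frames."""
        mag_x = magnitude_spectrum(x_frame)
        mag_r = magnitude_spectrum(r_frame)
        # Noise suppression: |S(k)| = | |R(k)| - |N(k)| |
        mag_s = [abs(rk - nk) for rk, nk in zip(mag_r, self.noise)]
        # Remove the predicted echo, with C|N(k)| as a lower bound: |T(k)|
        mag_t = [max(sk - gk * xk, self.c * nk)
                 for sk, gk, xk, nk in zip(mag_s, self.gain, mag_x, self.noise)]
        # Energies via Parseval's theorem
        e_t = sum(tk * tk for tk in mag_t) / self.n
        e_noise = sum(nk * nk for nk in self.noise) / self.n
        doubletalk = e_t > self.threshold * max(e_noise, 1e-12)
        if not doubletalk:
            self._update_gain(mag_s, mag_x)
        return doubletalk

    def _update_gain(self, mag_s, mag_x):
        """Time-average |S(k)|/|X(k)| at the largest peaks of |X(k)| only."""
        a = self.gain_alpha
        peaks = sorted(range(self.n), key=lambda k: mag_x[k], reverse=True)
        for k in peaks[:self.n_peaks]:
            if mag_x[k] > 0.0:
                self.gain[k] = a * self.gain[k] + (1 - a) * mag_s[k] / mag_x[k]


# Synthetic check: far-end tones at bins 3 and 7, echo gain 0.5, car noise.
random.seed(7)
N_FFT = 32


def tone(bin_k, amp=1.0):
    return [amp * math.cos(2 * math.pi * bin_k * n / N_FFT) for n in range(N_FFT)]


def car_noise():
    return [random.gauss(0.0, 0.05) for _ in range(N_FFT)]


det = SpectralDoubletalkDetector(N_FFT)
for _ in range(50):                              # silence: learn |N(k)|
    det.update_noise(car_noise())

x = [a + b for a, b in zip(tone(3), tone(7))]
far_end_flags = [det.detect(x, [0.5 * xn + wn for xn, wn in zip(x, car_noise())])
                 for _ in range(30)]             # far-end speech only

v = tone(11)                                     # near-end talker, new frequency
r = [0.5 * xn + vn + wn for xn, vn, wn in zip(x, v, car_noise())]
print("far-end-only frames flagged:", sum(far_end_flags))
print("doubletalk frame flagged:", det.detect(x, r))
```

In the synthetic run, the two far-end tones are explained by the learned $G_k$ (which settles near the echo gain of 0.5), so far-end-only frames should stay below the threshold, while the near-end tone at bin 11 has no counterpart in |X(k)| and dominates $E_T$.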

BRIEF DESCRIPTION OF THE DRAWINGS

The features, objects, and advantages of the present 
invention will become more apparent from the detailed 
description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:

FIG. 1 is a block diagram of a traditional acoustic echo canceller;

FIG. 2 is a block diagram of the acoustic echo canceller of the present invention; and

FIG. 3 is a block diagram of the doubletalk detection apparatus of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 2, in the preferred embodiment, the frequency representations of x(n) and r(n) are obtained using the Fast Fourier Transform (FFT), a fast implementation of the Discrete Fourier Transform (DFT), the implementation of which is well known in the art. X(k) and R(k) are sets of frequency components of x(n) and r(n) respectively, where the lengths and frequency spacings of X(k) and R(k) are determined by the order of the transform.

The far-end speech signal x(n) is provided to loudspeaker 30 and FFT element 44. The far-end speech signal x(n) is broadcast out of loudspeaker 30 into acoustic echo path 32, which provides echo signal y(n). Noise signal w(n) and near-end speech signal v(n) are added to echo signal y(n), as illustrated by summers 34 and 36 respectively. Again, it should be noted that summers 34 and 36 and acoustic echo path 32 are artifacts of the mobile environment and are presented for illustrative purposes. The sum of echo signal y(n), noise signal w(n), and near-end speech signal v(n) is provided to microphone 38. The output of microphone 38 is r(n).

The far-end speech signal x(n) is provided to FFT element 44, which determines the frequency representation of the far-end speech signal, X(k). The output of microphone 38, r(n), is provided to FFT element 40, which determines the frequency representation of the microphone output, R(k). The frequency representations are provided to doubletalk detection element 42, which compares the two signals and determines whether doubletalk is present. If doubletalk is determined to be present, doubletalk detection element 42 provides a control signal to adaptive filter 46 to curtail adaptation of the filter tap values, and also provides a control signal to residual echo suppression element 50 to curtail its operation.

Adaptive filter 46 estimates the echo signal in accordance with the far-end speech signal x(n) and the error signal e(n). The estimated echo signal ŷ(n) is subtracted from the output of microphone 38, r(n), in summer 48. The output of summer 48 is the error signal, e(n), which is provided to residual echo suppression element 50, where additional echo suppression takes place.

In FIG. 3, doubletalk detection element 42 is shown in further detail. Doubletalk detection is performed in the frequency domain. The spectral components X(k) and R(k) are converted into polar form by polar conversion elements 70 and 92, respectively, to obtain their magnitude components |X(k)| and |R(k)|. The received car noise is suppressed in noise suppression element 82 to prevent spurious noise frequency peaks from being interpreted as doubletalk.

In noise suppression element 82, the noise is suppressed by low-pass averaging of the noise spectrum in noise spectrum averaging element 90 during periods of silence. Periods of silence are detected by silence detector 88, which enables noise spectrum averaging element 90 during detected periods of silence. Noise spectrum averaging element 90 provides the averaged noise magnitude spectrum |N(k)|, which is subtracted from the received magnitude spectrum |R(k)|. The absolute value of the difference is determined in magnitude element 84 to obtain the noise-suppressed received magnitude spectrum |S(k)|.

The magnitude components of the far-end speech spectrum |X(k)| are weighted in multiplier 72 by $G_k$, where $G_k$ is a frequency-dependent scalar that estimates the echo channel magnitude response at that frequency. The output of multiplier 72, $G_k|X(k)|$, is provided to summer 74, where it is subtracted from the noise-suppressed received magnitude spectrum |S(k)|. This difference is compared to the product of a constant C (C < 1) and |N(k)|, and the maximum of the two is chosen in selection element 76 to form magnitude spectrum |T(k)|. By using $C|N(k)|$ as a lower bound, it is ensured that each frequency component makes a positive contribution toward |T(k)|. The energy of |T(k)| is computed in energy computation element 80 by Parseval's theorem, where N represents the order of the FFT:

$$E_T = \frac{1}{N}\sum_{k=0}^{N-1} |T(k)|^2$$

If this energy exceeds some predetermined threshold as compared with the average background noise energy, doubletalk is declared.

The coefficients $G_k$ can be computed by several means. If adaptive filter 46 has converged, they can be estimated by finding the magnitude spectrum of the impulse response of the adaptive filter. In a noisy situation where the filter has not converged, these coefficients can be approximated by time-averaging the quotient |S(k)|/|X(k)| for large components of X(k) when doubletalk is not declared. That is, for each frame of N samples corresponding to a set of N frequency components X(k), only the estimates of $G_k$ for the largest frequency peaks in |X(k)| are updated, and the other coefficients are left unchanged. This gives a more accurate estimate in the presence of noise. The method and apparatus described in the exemplary embodiment for the detection of doubletalk are equally applicable to the detection of near-end-only speech and far-end-only speech conditions.

The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

I claim:

1. An apparatus for detecting doubletalk comprising:

a first transform element having an input for receiving a far-end signal and having an output;

a second transform element having an input for receiving a near-end signal and having an output, the near-end signal including an uncancelled echo component;

a detector having a first input coupled to said first transform element output and a second input coupled to said second transform element output for detecting a doubletalk condition in accordance with a signal provided by said first transform element and a signal provided by said second transform element; and



10/29/2003, EAST Version: 1.4.1 




an adaptive filter coupled to the detector, the adaptive 
filter configured for adapting filter tap values, all adapt- 
ing of filter tap values being prevented when the 
detector detects a doubletalk condition. 

2. An echo canceller comprising: 

first transform means for receiving a far-end audio signal 
and transforming said far-end audio signal to a fre- 
quency representation of said far-end audio signal in 
accordance with a predetermined transform format; 

second transform means for receiving a near-end audio 
signal including an uncancelled echo component and 
transforming said near-end audio signal including the 
uncancelled echo component to a frequency represen- 
tation of said near-end audio signal in accordance with 
a predetermined second transform format; 

detection means for receiving a first signal representative 
of said frequency representation of said far-end audio 
signal and a second signal representative of said fre- 
quency representation of said near-end audio signal and 
for comparing said first and second signals with each 
other and selectively providing a doubletalk signal in 
accordance with said comparison; 

adaptive filter means for receiving said far-end audio 
signal and said doubletalk signal, for generating an 
estimated echo signal in accordance with said far-end 
audio signal and a set of adaptive filter parameters, and 
for adapting said set of adaptive filter parameters only 
when said doubletalk signal is absent; and 

echo removal means for receiving said near-end audio signal and said estimated echo signal and subtracting said estimated echo signal from said near-end audio signal.

3. The apparatus of claim 2 further comprising a residual echo suppression means for receiving an echo residual signal and suppressing remaining echo in said echo residual signal in accordance with an echo suppression format.

4. An apparatus for detecting doubletalk, comprising:

first transform means for receiving a far-end audio signal and for transforming said far-end audio signal to a far-end frequency representation of said far-end audio signal in accordance with a predetermined first transform format;

second transform means for receiving a near-end audio 
signal including an unremoved echo component and a 
noise component and for transforming said near-end 
audio signal to a near-end frequency representation of 
said near-end audio signal in accordance with a prede- 
termined second transform format; 

noise suppression means for receiving said near-end fre- 
quency representation and for generating a noise- 
suppressed near-end frequency representation in accor- 
dance with a predetermined noise suppression format; 
and 

detection means for receiving said far-end frequency 
representation and said noise-suppressed near-end fre- 
quency representation and for generating a signal 
indicative of a doubletalk condition in accordance with 
said far-end frequency representation and said noise- 
suppressed near-end frequency representation. 

5. The apparatus of claim 4 wherein said detection means 
comprises: 
subtraction means for subtracting said far-end frequency representation from said noise-suppressed near-end frequency representation to provide a difference signal;

energy computation means for determining an energy value of said difference signal in accordance with a predetermined energy computation format; and

comparison means for comparing said difference signal energy value with the predetermined threshold value and for selectively providing a signal indicative of a doubletalk condition in accordance with said comparison.

6. The apparatus of claim 5 wherein said far-end frequency representation comprises frequency components, and wherein said detection means further comprises weighting means for weighting said frequency components of said far-end frequency representation.

7. The apparatus of claim 5 wherein said noise suppression means generates said noise-suppressed near-end frequency representation by generating a noise spectrum estimate of said noise component and subtracting said noise spectrum estimate from said near-end frequency representation.

8. A method for detecting the existence of a doubletalk condition wherein said doubletalk condition exists when both near-end and far-end audio signals are present, said near-end audio signal including an unremoved echo component and a noise component, comprising the steps of:

transforming said far-end audio signal to a frequency representation of said far-end audio signal in accordance with a predetermined first transform format;

transforming said near-end audio signal to a frequency representation of said near-end audio signal in accordance with a predetermined second transform format;

suppressing said noise component of said near-end frequency representation in accordance with a predetermined noise suppression format to generate a noise suppressed frequency format; and

determining the presence of said doubletalk condition in accordance with said far-end frequency representation and said noise suppressed near-end frequency representation.

9. The method of claim 8 wherein said step of determining comprises the steps of:

subtracting said far-end frequency representation from said noise suppressed near-end frequency representation to provide a difference signal;

determining an energy value of said difference signal in accordance with a predetermined energy computation format; and

comparing said difference signal energy value with a predetermined threshold value to selectively provide a signal indicative of said doubletalk condition.

10. The method of claim 9 wherein said far-end frequency representation comprises frequency components, further comprising the step of weighting said frequency components.

11. The method of claim 8 wherein said step of suppressing comprises the steps of:

generating a noise spectrum estimate; and

subtracting said noise spectrum estimate from said near-end frequency representation.

***** 